
    Boosting Drug Named Entity Recognition using an Aggregate Classifier

    Objective: Drug named entity recognition (NER) is a critical step for complex biomedical NLP tasks such as the extraction of pharmacogenomic, pharmacodynamic and pharmacokinetic parameters. Large quantities of high-quality training data are almost always a prerequisite for employing supervised machine-learning techniques to achieve high classification performance. However, the human labour needed to produce and maintain such resources is a significant limitation. In this study, we improve the performance of drug NER without relying exclusively on manual annotations.
    Methods: We perform drug NER using either a small gold-standard corpus (120 abstracts) or no corpus at all. We develop a voting system that combines a number of heterogeneous models, based on dictionary knowledge, gold-standard corpora and silver annotations, to enhance performance. To improve recall, we employed genetic programming to evolve 11 regular-expression patterns that capture common drug suffixes and used them as an additional means of recognition.
    Materials: Our approach uses a dictionary of drug names (DrugBank), a small manually annotated corpus (the pharmacokinetic corpus) and part of the UKPMC database as raw biomedical text. Gold-standard and silver annotated data are used to train maximum entropy and multinomial logistic regression classifiers.
    Results: Aggregating drug NER methods based on gold-standard annotations, dictionary knowledge and patterns improved performance over models trained on gold-standard annotations only, achieving a maximum F-score of 95%. In addition, combining models trained on silver annotations with dictionary knowledge and patterns is shown to achieve performance comparable to that of models trained exclusively on gold-standard data. The main reason appears to be the morphological similarities shared among drug names.
    Conclusion: We conclude that gold-standard data are not a hard requirement for drug NER. Combining heterogeneous models built on dictionary knowledge can achieve classification performance comparable to that of the best-performing model trained on gold-standard annotations.
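The voting idea described above can be sketched as follows. This is a minimal illustration, not the study's system: the suffix patterns (-mab, -pril, -olol, -statin) are common pharmacological stems chosen for illustration only, since the 11 evolved patterns are not listed; the lexicon and the "classifier" labels are hypothetical stand-ins for DrugBank and a trained model; and a plain per-token majority vote is assumed as the combination rule.

```python
import re
from collections import Counter

# Hypothetical suffix pattern for illustration; NOT the 11 patterns evolved in the study.
SUFFIX_PATTERN = re.compile(r"\w+(?:mab|pril|olol|statin)", re.IGNORECASE)


def pattern_tagger(tokens):
    """Label a token DRUG if the whole token matches the suffix pattern, else O."""
    return ["DRUG" if SUFFIX_PATTERN.fullmatch(t) else "O" for t in tokens]


def dictionary_tagger(tokens, lexicon):
    """Label a token DRUG if it appears (case-insensitively) in a drug lexicon."""
    return ["DRUG" if t.lower() in lexicon else "O" for t in tokens]


def majority_vote(label_sequences):
    """Combine per-token labels from several models; ties resolve to O."""
    combined = []
    for votes in zip(*label_sequences):
        counts = Counter(votes)
        combined.append("DRUG" if counts["DRUG"] > counts["O"] else "O")
    return combined


if __name__ == "__main__":
    tokens = ["Patients", "received", "lisinopril", "and", "atorvastatin", "daily"]
    lexicon = {"lisinopril", "warfarin"}  # hypothetical stand-in for a drug dictionary
    classifier = ["O", "O", "O", "O", "DRUG", "O"]  # hypothetical trained-model output
    print(majority_vote([pattern_tagger(tokens), dictionary_tagger(tokens, lexicon), classifier]))
```

Here each drug is missed by one voter but still recovered by the other two, which is the kind of complementarity an aggregate classifier relies on.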

    Visual identification of individual Holstein-Friesian cattle via deep metric learning

    Holstein-Friesian cattle exhibit individually characteristic black and white coat patterns visually akin to those arising from Turing's reaction-diffusion systems. This work takes advantage of these natural markings to automate visual detection and biometric identification of individual Holstein-Friesians via convolutional neural networks and deep metric learning techniques. Existing approaches rely on markings, tags or wearables with a variety of maintenance requirements, whereas we present a totally hands-off method for the automated detection, localisation and identification of individual animals from overhead imaging in an open herd setting, i.e. where new additions to the herd are identified without re-training. We propose the use of SoftMax-based reciprocal triplet loss to address the identification problem and evaluate the techniques in detail against fixed herd paradigms. We find that deep metric learning systems show strong performance even when many cattle unseen during system training are to be identified and re-identified, achieving 98.2% accuracy when trained on just half of the population. This work paves the way for non-intrusive monitoring of cattle, applicable to precision farming and surveillance for automated productivity, health and welfare monitoring, and to veterinary research such as behavioural analysis, disease outbreak tracing, and more. Key parts of the source code, network weights and underpinning datasets are publicly available.
    Comment: 37 pages, 14 figures, 2 tables; Submitted to Computers and Electronics in Agriculture; Source code and network weights available at https://github.com/CWOA/MetricLearningIdentification; OpenCows2020 dataset available at https://doi.org/10.5523/bris.10m32xl88x2b61zlkkgz3fml1
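A SoftMax-based reciprocal triplet loss can be sketched as below, under stated assumptions: the reciprocal triplet loss is taken to be d(a, p) + 1/d(a, n) with Euclidean distance and no margin, and the combination is assumed to be a softmax cross-entropy term plus a weight `lam` times that loss. The weight value and the `eps` guard are hypothetical choices, and real training would operate on batched network embeddings rather than plain tuples.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def reciprocal_triplet_loss(anchor, positive, negative, eps=1e-8):
    """Assumed form: pull the positive close, push the negative via 1 / d(a, n).

    eps is a hypothetical guard against division by zero.
    """
    return euclidean(anchor, positive) + 1.0 / (euclidean(anchor, negative) + eps)


def softmax_cross_entropy(logits, target):
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def combined_loss(logits, target, anchor, positive, negative, lam=0.01):
    """Hypothetical weighting: softmax loss plus lam times the reciprocal triplet loss."""
    return softmax_cross_entropy(logits, target) + lam * reciprocal_triplet_loss(
        anchor, positive, negative
    )


if __name__ == "__main__":
    print(combined_loss([2.0, 0.5, -1.0], 0, (0.0, 0.0), (3.0, 4.0), (0.0, 2.0)))
```

Compared with a standard margin-based triplet loss, the reciprocal term removes the margin hyperparameter: the penalty shrinks smoothly as the negative moves away instead of dropping to zero past a fixed threshold.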

    mzMLb: A Future-Proof Raw Mass Spectrometry Data Format Based on Standards-Compliant mzML and Optimized for Speed and Storage Requirements

    With ever-increasing amounts of data produced by mass spectrometry (MS) proteomics and metabolomics, and the sheer volume of samples now analyzed, the need for a common open format possessing both file size efficiency and faster read/write speeds has become paramount to drive the next generation of data analysis pipelines. The Proteomics Standards Initiative (PSI) has established a clear and precise extensible markup language (XML) representation for data interchange, mzML, which has received substantial uptake; nevertheless, storage and file access efficiency have not been its main focus. We propose an HDF5 file format, "mzMLb", that is optimized for both read/write speed and storage of raw mass spectrometry data. We provide an extensive validation of write speed, random read speed and storage size, demonstrating a flexible format that, with or without compression, is faster than all existing approaches in virtually all cases and that, with compression, is comparable in size to proprietary vendor file formats. Because our approach uniquely preserves the XML encoding of the metadata, the format implicitly supports future versions of mzML and is straightforward to implement. mzMLb's design adheres to both the HDF5 and NetCDF4 standard implementations, whose widespread programming language support makes the format easy for third parties to use. A reference implementation within the established ProteoWizard toolkit is provided.
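One source of mzML's storage overhead is that binary data arrays are base64-encoded text inside XML `<binary>` elements, which inflates them by roughly 4/3. The sketch below, using only the standard library, compares the size of a packed little-endian float64 array with its base64 text form. It illustrates that overhead under these assumptions only; it is not mzMLb's implementation, which stores data in an HDF5 container, and it ignores compression entirely.

```python
import base64
import struct


def raw_bytes(values):
    """Pack values as little-endian float64, as a binary array store could hold them."""
    return struct.pack(f"<{len(values)}d", *values)


def mzml_style_text(values):
    """Base64-encode the same packed bytes, as mzML does for its binary arrays."""
    return base64.b64encode(raw_bytes(values)).decode("ascii")


if __name__ == "__main__":
    mz = [100.0 + 0.5 * i for i in range(1000)]  # hypothetical m/z values
    raw = raw_bytes(mz)
    text = mzml_style_text(mz)
    print(len(raw), len(text), round(len(text) / len(raw), 3))
```

Base64 emits 4 characters per 3 input bytes, so 8,000 raw bytes become 10,668 characters of text before any XML markup is added, and the text must also be decoded on every read.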

    Cardiovascular magnetic resonance tagging of the right ventricular free wall for the assessment of long axis myocardial function in congenital heart disease

    Background: Right ventricular ejection fraction (RV-EF) has traditionally been used to measure and compare RV function serially over time, but may be a relatively insensitive marker of change in RV myocardial contractile function. We developed a cardiovascular magnetic resonance (CMR) tagging-based technique for rapid and reproducible measurement of RV long axis function and applied it in patients with congenital heart disease.
    Methods: We studied 84 patients: 56 with repaired Tetralogy of Fallot (rTOF) and 28 with atrial septal defect (ASD), of whom 13 had and 15 did not have pulmonary hypertension (RV pressure > 40 mmHg by echocardiography). For comparison, 20 healthy controls were studied. CMR acquisitions included an anatomically defined four-chamber cine followed by a cine gradient echo-planar sequence in the same plane with a labelling pre-pulse giving a tag line across the basal myocardium. RV tag displacement was measured with automated registration and tracking of the tag line, together with standard measurement of RV-EF.
    Results: Mean RV displacement was higher in controls (26 ± 3 mm) than in the rTOF (16 ± 4 mm) and ASD with pulmonary hypertension (18 ± 3 mm) groups, but lower than in the ASD group without pulmonary hypertension (30 ± 4 mm), P < 0.001. The technique was reproducible, with an inter-study bias ± 95% limits of agreement of 0.7 ± 2.7 mm. While RV-EF was lower in rTOF than in controls (49 ± 9% versus 57 ± 6%, P < 0.001), it did not differ between either ASD group and controls.
    Conclusions: Measurements of RV long axis displacement by CMR tagging showed more differences between the groups studied than did RV-EF, and the technique was reproducible, quick and easy to apply. Further work is needed to assess its potential for detecting longitudinal changes in RV myocardial function.
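The reproducibility figure above is reported as an inter-study bias ± 95% limits of agreement. In the usual Bland-Altman convention, the bias is the mean of the paired differences and the half-width of the limits is 1.96 times their sample standard deviation. The minimal sketch below computes both; the paired displacement values are hypothetical and are not the study's data.

```python
import statistics


def bland_altman(first, second):
    """Return (bias, half-width of 95% limits of agreement) for paired measurements.

    bias = mean of (first - second); half-width = 1.96 * sample SD of the differences.
    """
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    half_width = 1.96 * statistics.stdev(diffs)
    return bias, half_width


if __name__ == "__main__":
    study_1 = [20.0, 22.0, 24.0, 26.0]  # hypothetical RV displacement (mm), first scan
    study_2 = [19.0, 23.0, 23.0, 27.0]  # hypothetical repeat scan of the same subjects
    bias, half_width = bland_altman(study_1, study_2)
    print(f"bias {bias:.1f} mm, 95% limits of agreement ± {half_width:.1f} mm")
```

A small bias with narrow limits, as in the 0.7 ± 2.7 mm reported above, indicates that repeat scans of the same subject agree closely relative to the differences between groups.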

    Diagnostic MALDI-TOF MS can differentiate between high and low toxic Staphylococcus aureus bacteraemia isolates as a predictor of patient outcome

    Staphylococcus aureus bacteraemia (SAB) is a major cause of blood-stream infection (BSI) in both healthcare and community settings. While a patient's underlying comorbidities contribute significantly to their susceptibility to, and outcome following, SAB, recent studies show the importance of the level of cytolytic toxin production by the infecting bacterium. In this study we demonstrate that this cytotoxicity can be determined directly from the diagnostic MALDI-TOF mass spectrum generated in a routine diagnostic laboratory. With further development, this information could be used to guide management and improve outcomes for SAB patients.